Abstract

Introduction: Human population-based genome-wide association (GWA) studies identify low penetrance breast 
cancer risk alleles; however, GWA studies alone do not definitively determine causative genes or mechanisms. 
Stringent genome-wide statistical significance level requirements, set to avoid false-positive associations, yield
many false-negative associations. Laboratory rats (Rattus norvegicus) are useful to study many aspects of breast
cancer, including genetic susceptibility. Several rat mammary cancer associated loci have been identified using
genetic linkage and congenic strain-based approaches. Here, we sought to determine the amount of overlap
between GWA study nominated human breast and rat mammary cancer susceptibility loci. 

Methods: We queried published GWA studies to identify two groups of SNPs, one that reached genome-wide 
significance and one composed of SNPs failing a validation step and not reaching genome-wide significance.
Human genome locations of these SNPs were compared to known rat mammary carcinoma susceptibility loci to 
determine if risk alleles existed in both species. Rat genome regions not known to associate with mammary cancer 
risk were randomly selected as control regions. 

Results: Significantly more human breast cancer risk GWA study nominated SNPs mapped at orthologs of rat 
mammary cancer loci than to regions not known to contain rat mammary cancer loci. The rat genome was useful to 
predict associations that had met human genome-wide significance criteria and weaker associations that had not. 

Conclusions: Integration of human and rat comparative genomics may be useful to parse out false-negative 
associations in GWA studies of breast cancer risk. 



Introduction 

Breast cancer is a complex disease characterized by environmental,
genetic, and epigenetic factors. Due to the complexity
of developing this disease, a woman's individual
risk may vary greatly from population risk estimates. The
familial relative risk of developing breast cancer increases 
with the number of affected relatives, suggesting that there 
is a strong genetic component associated with this disease 
[1,2]. High-penetrance breast cancer risk mutations such 
as those of BRCA1 and BRCA2 have been identified [3,4]. 
Mutations with high penetrance toward risk are rare in
populations due to their severe effects on individuals
and, thus, these mutations account for only a small
percentage of population risk. Risk alleles with moderate 
penetrance and minor allele population frequencies of 
0.005 to 0.01 (for example, PALB2) are estimated to ac- 
count for approximately 3% of risk. Therefore, a majority 
of population-based breast cancer risk is likely explained 
by low penetrance alleles with rare to common population 
frequencies [5]. 

Genome-wide association (GWA) studies have been 
used to identify several low penetrance breast cancer risk 
alleles [6]. Due to a need to control for numerous mul- 
tiple comparisons made in GWA studies, a Bonferroni 
correction based P-value cut-off of <1 x 10" is typically
required for an association to be considered genome- 
wide significant. It has been suggested that this approach 
is too stringent as it may result in many false negative 
associations [7]. Furthermore, while GWA studies are
unbiased approaches to identify genomic regions associ- 
ated with breast cancer risk, these epidemiology-based ap- 
proaches cannot easily determine risk genes or genetically 
determined mechanisms of susceptibility. Currently, only 
a small percentage of breast cancer heritability is explained
by published studies, suggesting that considerable
genetic variation associated with breast cancer risk remains
to be identified [5,8].

Comparative genetics between rats and humans has 
also been used to identify breast cancer risk alleles [9]. 
In general, the laboratory rat is a good experimental or- 
ganism to model breast cancer. Compared to induced 
mammary tumors in mice, rats develop mammary car- 
cinomas of ductal origin, which is similar to a majority 
of human breast cancers. Also, rat mammary tumors are 
responsive to estrogen, just as are a majority of human 
breast tumors [10,11]. Most importantly, the laboratory 
rat is a versatile organism to study breast cancer suscep- 
tibility, as experiments can be controlled at genetic and 
environmental levels. Inbred rat strains exhibit differen- 
tial susceptibility to chemically induced carcinogenesis 
using 7,12-dimethylbenz[a]anthracene (DMBA) [10,12-14].
Copenhagen (COP) and Wistar-Kyoto (WKY) rat strains
are resistant to DMBA, N-Nitroso-N-methylurea (NMU),
and oncogene induced mammary carcinomas, while the
Wistar-Furth (WF) rat strain is susceptible.

Previous genetic studies using rats have identified
eight Mammary carcinoma susceptibility (Mcs) loci,
named Mcs1-8 [15-18]. A (WFxCOP)F1 x WF backcross
design was used to identify Mcs1-4. Copenhagen
alleles at Mcs1-3 are associated with decreased mammary
tumor multiplicity, while the Mcs4 COP allele is
associated with increased tumor development [15]. Further
analysis of the Mcs1 locus using WF.COP congenic
lines, spanning different regions of the quantitative trait
locus (QTL), identified three independent loci associated
with mammary carcinoma susceptibility, named Mcs1a-c
[17]. Another linkage analysis study using WF and WKY
rat strains revealed four additional QTLs associated with
mammary carcinoma susceptibility, named Mcs5-8. Additionally,
a modifier of Mcs8, Mcsm1, partially counteracts
the resistance phenotype conferred by Mcs8 [16,18]. Further
analysis of the Mcs5 locus using WF.WKY congenic
rat lines resulted in the identification of four subloci
named Mcs5a1, Mcs5a2, Mcs5b and Mcs5c [9,19]. Additional
linkage analysis using the SPRD-Cu3 rat strain
(DMBA-induced mammary carcinogenesis susceptible)
and the resistant WKY rat strain resulted in the identification
of three more rat QTLs associated with mammary
cancer named Mcstm1, Mcstm2/Mcsta2 and Mcsta1
[20,21]. Several rat genomic regions that associate with
mammary cancer susceptibility were identified using
beta-estradiol instead of DMBA to induce carcinogenesis.



These QTLs were identified using the August Copenhagen 
Irish (ACI) rat strain, which is susceptible to beta-estradiol 
carcinogenesis and the COP and Brown Norway (BN) rat 
strains, which are resistant. These loci are named Estrogen-induced
mammary cancer loci or Emca1-2 and
Emca4-8 [22,23].

Comparative genomics between human breast and rat 
mammary cancer risk alleles will continue to be war- 
ranted, especially if appreciable overlap in genetic sus- 
ceptibility exists between these species. In this study, 
genomic locations of human breast cancer risk GWA 
study-identified polymorphisms were compared to the 
rat genome to determine if positive associations were 
more often located at orthologs to rat mammary cancer 
risk loci than at randomly selected regions not known to 
be associated with rat mammary cancer susceptibility. 

Methods 

Converting rat mammary cancer associated loci to human 
orthologous regions 

No research animals were used in this work. Previously 
published information on rat mammary cancer associ- 
ated loci was used. Human orthologous regions of rat re- 
gions that associate with mammary cancer susceptibility 
listed in Table 1 were determined using the 'In other 
genomes (convert)' function available at the University 
of California Santa Cruz (UCSC) genome browser [24]. 
Rat Nov. 2004 (Baylor 3.4/rn4) and human Feb. 2009 
(GRCh37/hg19) genome assemblies were used. If a rat
mammary cancer locus split into multiple human 
orthologous regions, we noted all orthologous regions 
until they reached less than 1% of the bases and 
spanned less than 1% of the original rat mammary can- 
cer locus using the UCSC genome browser. 

Random rat regions 

To determine if human GWA study-identified polymor- 
phisms map to rat mammary cancer loci more fre- 
quently than to random regions of the rat genome, we 
selected rat genome segments that have not shown an 
association with mammary cancer risk. These rat gen- 
omic regions were named 'random rat regions' and are 
listed in Table 2. We initially focused on fourteen Mcs/ 
Mcsm regions with an average size of 22,322,710 bps as 
these are generally smaller in size than other rat mam- 
mary cancer associated loci identified. Fourteen random 
rat genome regions, each 22,322,710 bps in size were 
used for comparison. Random rat regions were selected 
by picking a chromosome using a random number gen- 
erator function of Microsoft Excel. The range of chro- 
mosomes entered into the random number generator 
function was 1 through 21 (rats have 21 chromosomes, 
including a sex chromosome). The start position for 
each random rat region was determined using a random 



Table 1 Location of rat mammary cancer susceptibility loci and human orthologous regions used in this study 



Rat Mcs locus (Overlap) | Boundary markers | Rat chr | Region (UCSC rat assembly 2004) | Reference
DMBA induced mammary carcinogenesis
Mcs1a | D2Mit29 to D2Uwm14 | RNO2 | 5,601,528-10,539,344 | Haag et al. [17]
Mcs1b | ENSRNOSNP2740854 to g2UI2-27 | RNO2 | 42,364,155-44,195,382 | DenDekker et al. [25]
Mcs1c | D2M13Mit286 to D2Uia5 | RNO2 | 13,909,383-20,666,092 | Haag et al. [17]
Mcs2 (overlaps Mcs6, Emca4) | D7rat39 to D7Uwm12 | RNO7 | 4,936,704-86,028,057 | Sanders et al. [18]
Mcs3 | D1Rat27 to D1Mit12 | RNO1 | 90,282,174-156,954,117 | Shepel et al. [15]
Mcs4 | D8Rat164 to D8Rat1 | RNO8 | 28,414,100-72,403,639 | Shepel et al. [15]
Mcs5a1 (overlaps Mcstm1, Emca8) | SNP-61634906 to SNP-6166691E | RNO5 | 61,634,727-61,666,739 | Samuelson et al. [9]
Mcs5a2 (overlaps Mcstm1, Emca8) | SNP-61667232 to gUwm23-29 | RNO5 | 61,667,053-61,751,614 | Samuelson et al. [9]
Mcs5b (overlaps Mcstm1, Emca8) | gUwm50-20 to D5Got9 | RNO5 | 65,498,190-67,464,050 | Samuelson et al. [19]
Mcs5c (overlaps Mcstm1, Emca8) | gUwm74-1 to gUwm54-8 | RNO5 | 81,118,457-81,295,367 | Veillet et al. [26]
Mcs6 (overlaps Mcs2) | D7Rat171 to gUwm64-3 | RNO7 | 22,382,725-55,384,873 | Sanders et al. [18]
Mcs7 (overlaps Mcsta1) | D10Got124 to gUwm58-136 | RNO10 | 89,575,060-100,335,500 | Cotroneo et al. [27]
Mcs8 | D14Mit1 to D14Rat99 | RNO14 | 12,386,493-26,416,791 | Lan et al. [16]
Mcsm1 (overlaps Emca7) | D6Mit9 to D6Rat12 | RNO6 | 34,039,303-114,032,192 | Lan et al. [16]

Human orthologous region
(UCSC human assembly 2009)



CM: 89,216,702-93,113,337
CM: 54,816,178-57,003,049
CM: 81,891,633-86,857,442
CM: 86,171,198-86,251,067
Chr12: 57,316,160-108,177,690
Chr8: 97,242,984-115,650,989
Chr19: 281,161-2,497,331
Chr19: 15,059,910-15,808,112
Chr15: 80,282,370-102,265,870
Chr15: 25,574,935-28,567,541
Chr11: 17,403,456-22,898,646
Chr11: 74,958,193-89,350,902
Chr19: 48,799,986-51,921,957
Chr19: 28,701,413-30,656,003
Chr11: 107,453,990-132,383,506
Chr15: 62,105,069-76,028,735
Chr15: 76,091,658-78,185,872
Chr15: 78,380,119-78,998,961
Chr15: 51,349,646-51,942,505
Chr9: 37,562,516-37,589,491
Chr9: 37,590,988-37,654,512
Chr9: 103,492,712-105,220,552
Chr9: 118,231,525-118,416,951
Chr12: 72,033,141-72,033,263
ChrU: 71,270,266-105,502,699
Chr17: 40,183,547-67,946,104
Chr4: 65,556,457-81,559,483
Chr14: 25,151,530-80,417,386
Chr2: 33,441-18,603,019
Chr7: 12,561,599-19,619,365
Chr7: 107,770,320-111,916,436



Table 1 Location of rat mammary cancer susceptibility loci and human orthologous regions used in this study (Continued) 



Mcstm1 (overlaps Mcs5, Emca1, Emca8) | D5Rat124 to Pla2g2a | RNO5 | 19,206,257-157,657,360 | Piessevaux et al. [21]
Mcstm2 (overlaps Emca2) | D18Wox8 to D18Rat44 | RNO18 | 32,458,819-86,863,412 | Piessevaux et al. [21]
Mcsta1 (overlaps Mcs7) | D10Rat91 to D10Rat97 | RNO10 | 9,762,188-108,776,963 | Piessevaux et al. [21]
β-estradiol induced mammary carcinogenesis
Emca1 (overlaps Mcstm1, Emca8) | D5Rat53 to D5Rat57 | RNO5 | 103,677,474-155,121,024 | Gould et al. [22]
Emca2 (overlaps Mcstm2) | D18Rat27 to D18Rat43 | RNO18 | 18,562,643-66,652,947 | Gould et al. [22]
Emca4 (overlaps Mcs2) | D7Rat44 to D7Rat15 | RNO7 | 66,201,980-107,428,439 | Schaffer et al. [23]



Chr1: 20,301,931-59,012,763
Chr1: 59,119,520-67,602,141
Chr9: 27,325,071-123,488,955
Chr6: 87,792,854-100,245,025
Chr8: 87,055,841-97,247,307
Chr8: 58,994,818-62,700,945
Chr18: 10,202,644-13,129,349
Chr18: 41,356,963-54,158,113
Chr18: 54,267,924-58,201,561
Chr18: 66,912,039-78,010,606
CM: 112,300,500-130,363,372
CM: 142,780,151-147,624,793
CM: 147,647,196-150,176,352
CM: 130,482,861-173,663,969
CM: 177,530,539-180,675,650
Chr17: 690,639-15,624,409
Chr17: 16,916,926-20,222,700
Chr17: 25,525,650-78,247,249
Chr16: 78,402-6,094,950

Chr1: 23,607,020-59,012,763
Chr1: 59,119,520-67,602,141
Chr9: 17,037,252-27,300,264
CM: 110,259,180-130,363,372
CM: 137,224,929-147,624,793
CM: 147,647,196-150,176,352
Chr18: 10,202,644-13,129,349
Chr18: 35,982,130-41,016,602
Chr18: 52,597,120-54,158,113
Chr18: 54,267,924-58,201,561
CM: 127,805,417-128,786,719
Chr8: 97,242,984-137,409,536
Chr12: 57,316,160-59,093,375



Table 1 Location of rat mammary cancer susceptibility loci and human orthologous regions used in this study (Continued) 



Rat Mcs locus (Overlap) | Boundary markers | Rat chr | Region (UCSC rat assembly 2004) | Reference | Human orthologous region (UCSC human assembly 2009)
Emca5 | D3Rat227 to D3Rat210 | RNO3 | 41,054,012-171,063,335 | Schaffer et al. [23] | Chr20: 1,746,912-62,907,504; Chr2: 110,841,402-113,650,057; Chr2: 159,530,076-188,395,371; CM 7: 26,296,319-57,753,858; Chr15: 32,905,485-34,664,466; Chr15: 34,933,152-51,298,144
Emca6 | D4Rat14 to D4Rat202 | RNO4 | 41,729,583-159,115,617 | Schaffer et al. [23] | Chr7: 23,252,368-33,103,107; Chr7: 115,026,301-150,558,396; Chr3: 88,756-12,883,445; Chr3: 13,004,609-15,163,132; Chr3: 64,018,604-75,322,612; Chr3: 125,977,400-128,219,297; Chr2: 68,713,643-89,165,869; Chr4: 89,504,626-95,273,083; Chr4: 120,978,632-122,320,931; Chr12: 156,786-2,821,588; ChrW: 43,277,230-46,218,580
Emca7 (overlaps Mcsm1) | D6Rat68 to D6Rat81 | RNO6 | 2,802,670-111,967,837 | Schaffer et al. [23] | Chr14: 25,151,530-78,362,253; Chr2: 33,441-35,642,893; Chr2: 38,644,737-51,698,454; Chr7: 12,561,599-19,619,365; Chr7: 105,197,211-111,916,436
Emca8 (overlaps Mcs5, Mcstm1, Emca1) | D5Rat134 to D5Rat37 | RNO5 | 52,434,178-148,460,381 | Schaffer et al. [23] | Chr9: 6,756,013-27,300,264; Chr9: 27,925,947-123,488,955; Chr1: 33,159,021-59,012,763; Chr1: 59,119,520-67,602,141
Table 2 Random rat genomic segments and human orthologous regions used in this study

Rat Mcs locus | Rat chr | Region (UCSC rat assembly 2004) | Human orthologous region (UCSC human assembly 2009)
Random rat region 1 | RNO9 | 20,000,000-44,322,711 | Chr2: 97,158,323-106,711,249; Chr6: 56,223,874-73,919,999; Chr2: 189,182,486-189,878,065; Chr13: 103,235,577-103,556,495; Chr2: 128,848,569-129,254,860
Random rat region 2 | RNO15 | 60,000,001-84,322,711 | Chr13: 53,226,266-74,878,291; Chr13: 42,064,282-42,529,444
Random rat region 3 | RNO16 | 68,621,246-92,943,956 | Chr13: 103,539,456-115,092,822; Chr8: 36,627,241-42,308,840; Chr8: 638,582-6,693,649; Chr8: 42,690,588-43,056,179; Chr13: 52,753,969-53,050,606
Random rat region 4 | RNO9 | 91,398,460-115,721,170 | Chr5: 98,385,946-110,062,886; Chr18: 612,848-9,957,727; Chr2: 240,340,012-242,806,427
Random rat region 5 | RNO13 | 55,373,307-79,696,017 | Chr1: 169,844,936-194,938,667
Random rat region 6 | RNO11 | 39,408,000-63,730,710 | Chr3: 95,108,010-118,895,417
Random rat region 7 | RNO17 | 68,384,015-92,706,72 | Chr10: 138,740-22,530,353; Chr1: 236,673,870-240,084,642
Random rat region 8 | RNO3 | 12,585,543-36,908,253 | Chr2: 140,246,548-155,465,845; Chr9: 123,526,091-129,443,210
Random rat region 9 | RNO19 | 34,130,390-58,453,100 | Chr16: 66,968,878-90,107,058; Chr10: 33,502,588-35,153,585; Chr1: 229,402,942-235,324,796; Chr4: 150,548,912-150,855,848
Random rat region 10 | RNO12 | 18,203,110-42,525,820 | Chr12: 110,503,298-120,870,994; Chr12: 121,578,435-132,335,900; Chr7: 66,878,689-71,941,664; Chr7: 101,137,811-102,184,451; Chr7: 99,995,220-100,350,712; Chr7: 72,707,443-74,223,683; Chr7: 75,027,443-76,145,496
Random rat region 11 | RNO20 | 30,416,373-54,739,083 | Chr6: 101,086,446-116,620,662; Chr6: 117,266,139-123,147,126; Chr2: 109,065,537-109,613,060; Chr6: 116,688,407-116,905,609
Random rat region 12 | RNO13 | 955,085-25,277,795 | Chr18: 58,351,906-63,553,937; Chr2: 124,758,685-125,682,595
Random rat region 13 | RNO1 | 1,136,860-25,459,569 | Chr6: 128,011,342-150,185,813; Chr6: 123,315,387-124,317,854
Random rat region 14 | RNO2 | 182,078,762-206,401,472 | Chr1: 107,259,608-154,441,176

number generator function of Excel. The rat genome is
2.75 Gb in size [28] or 130,952,381 bp per chromosome 
if divided equally across chromosomes. Therefore, values 
for the rat genome start-position were chosen from 1 to 
130,952,381 using a random number generator. Next,
22,322,709 bps were added to each random start
position to obtain the desired full length. The 14 random 
rat genome regions were then entered into the UCSC 
genome browser, and the human orthologous regions 
were determined using the 'in other genomes (convert)' 
function, as described above [24]. Randomly-generated 
rat genome segments were used as controls if the human 
orthologous segment did not contain a sequence that 
was also orthologous to a known rat mammary cancer 
associated locus. We also verified, using the UCSC gen- 
ome browser, that human orthologous regions to ran- 
dom rat regions were not located at human centromeric 
regions, as genetic variation in these chromosomal re- 
gions is underrepresented in GWA studies [29,30]. Total 
sizes and percentages of rat genome covered by rat 
mammary cancer loci and random genome regions used 
are in Table 3. 
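
The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the authors used the random number generator of Microsoft Excel, not Python, and the function and constant names below are hypothetical. The sketch draws a chromosome from 1 through 21 and a start position from 1 to 130,952,381, then adds 22,322,709 bp to reach the full region length. It does not perform the later filtering steps (exclusion of candidates orthologous to known loci or located at centromeric regions), which were done in the UCSC genome browser.

```python
import random

GENOME_SIZE_BP = 2_750_000_000    # rat genome size stated in the text
N_CHROMOSOMES = 21                # chromosome range entered into the generator
REGION_LENGTH_BP = 22_322_710     # average Mcs/Mcsm locus size from the text
# Equal split of the genome across chromosomes: 130,952,381 bp
MAX_START = round(GENOME_SIZE_BP / N_CHROMOSOMES)


def pick_random_region(rng):
    """Return (chromosome, start, end) for one candidate control region."""
    chromosome = rng.randint(1, N_CHROMOSOMES)
    start = rng.randint(1, MAX_START)
    # Adding 22,322,709 bp to the start gives a region of 22,322,710 bp.
    end = start + REGION_LENGTH_BP - 1
    return chromosome, start, end


def pick_random_regions(n, seed=None):
    """Draw n candidate regions; seed is optional and only aids repeatability."""
    rng = random.Random(seed)
    return [pick_random_region(rng) for _ in range(n)]


if __name__ == "__main__":
    for chrom, start, end in pick_random_regions(14, seed=0):
        print(f"RNO{chrom}: {start:,}-{end:,}")
```

Because real rat chromosomes differ in length, a start position drawn this way can fall beyond the end of a short chromosome; any such candidate would have to be checked against the assembly before use.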



Determining human GWA study nominated
polymorphisms

Human breast cancer risk GWA studies considered were
published through March 2013. In the first round of
analysis we picked GWA studies with a clearly defined
study population of subjects of European descent. In the
second round of analysis, the defined population was
broader and included studies that tested populations
of non-European descent. Studies that included non-European
descent populations were subdivided into
respective populations used. The GWA studies evaluated
are listed in Tables 4 and 5. The GWA studies
used evaluated breast cancer risk association in
multiple stages (two to four stages). In
our analysis, all SNPs that entered the final stage of
their respective study were compared in the rat genome.
A tested SNP was called either 'associated' if it
reached genome-wide significance in its respective
study or 'potentially associated' if it failed to meet the
respective study statistical criterion following the final
stage of analysis. Conventionally, a P-value level for
an association to be considered statistically significant
in a GWA study is 1 x 10' . This stringency is to protect
from false-positives due to multiple comparisons
on a genome-wide scale. It has been argued that this
low P-value requirement results in numerous false
negative associations [7]. Therefore, we queried supplemental
material of each published GWA study
considered and picked polymorphisms that failed the
validation stage in the respective study. We also included
polymorphisms that did reach genome-wide
significance. We considered 533 SNPs from studies
that included populations of European descent, and
285 SNPs from studies of non-European descent populations.
All SNPs used in this analysis are listed in
Additional file 1. Human genomic locations of polymorphisms
were found using dbSNP (GRCh37 assembly)
[31]. These were compared to locations of the
human orthologous regions of rat mammary cancer
loci and random rat regions. If a polymorphism
mapped to a region of interest, the name, location,
odds ratio, 95% confidence interval, and P-value were
noted.



Statistics 

The number of human polymorphisms that mapped to 
orthologous regions containing rat mammary cancer loci 
(observed) was compared to the number of human poly- 
morphisms that mapped to random rat regions (expected) 
using a chi-square analysis with one degree of freedom. 
Several rat mammary cancer loci overlap extensively and 
subsequently several human polymorphisms mapped to 
multiple rat loci. Currently, it is not known if these over- 
lapping rat mammary cancer loci would fine-map to the 
same locus or independent loci. For this study, human 
polymorphisms mapping to overlapping rat mammary 
cancer susceptibility associated sequences were counted 
only once. For analysis of associated (passed genome-wide 
significance level) versus potentially associated (did not 
pass genome-wide significance level) associations, a logis- 
tic regression was performed using SYSTAT 13 statistical 
software. A threshold of associated or potentially associ- 
ated was used as the independent variable and outcome 
was either the SNP mapped to a rat mammary cancer 
locus or it mapped to a random rat region. 
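
A minimal sketch of the chi-square comparison follows. It assumes the test statistic is the single term (observed - expected)^2 / expected evaluated with one degree of freedom; the text names the observed and expected counts but does not spell out the formula, so this reading is an assumption. The function name is hypothetical, and the counts in the example (66 and 51) are the European descent Mcs/Mcsm and random region counts reported in the Results.

```python
import math


def chi_square_1df(observed, expected):
    """Return (statistic, p_value) for one observed/expected pair, 1 df.

    Assumes the statistic is (O - E)^2 / E. For one degree of freedom the
    upper-tail probability of a chi-square value x equals erfc(sqrt(x / 2)).
    """
    statistic = (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


if __name__ == "__main__":
    # SNPs at Mcs/Mcsm orthologs (observed) versus random regions (expected)
    stat, p = chi_square_1df(66, 51)
    print(f"chi-square = {stat:.2f}, P = {p:.3f}")
```

Under this reading the statistic for 66 versus 51 is 225/51, about 4.41, which falls below the 0.05 significance level, in line with the reported result.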



Table 3 Total size and percentage of rat genome covered by rat mammary cancer loci and random rat regions 



Region | Loci | Total size (bases) | Total overlapping bases | Total unique bases | Rat genome portion (based on total unique bases)
Mcs/Mcsm only | 14 | 345,323,605 | 33,002,148 | 312,321,457 | 11.4%
All known rat mammary cancer loci | 24 | 1,230,487,116 | 325,386,323 | 905,100,793 | 32.9%
Random rat regions | 14 | 312,517,940 | | 312,517,940 | 11.4%




Table 4 Breast cancer risk genome-wide association studies using populations of European descent 



GWAS | Population | Stages | Cases/controls stage 1 | Cases/controls stage 2 | Cases/controls stage 3 | Cases/controls stage 4 | Study P-value cut-off for significance


Ahmcirl at rtt l~3 01 
fMllMCU ct ul. [dZ] 


European descent 


A 
4 


jyu/ jlh- 


3 QQ.n/3 QOQ 

j,yyu/ 3,yzo 


0,0/ 0/ J,yzo 


33 1 5/1 /3A 1/11 D ^ P 07 
jj ; I j4/00, I 4 I r <. t-U/ 


Antoniou et al. [33] | | | | | | |
Hunter et al. [40] | European descent | | 1,145/1,142 | 1,776/2,072 | | | P < 2E-05
Li et al. [41] | European descent | 2 | 617/4,583 | 1,011/7,604 | | | P < E-05
Li et al. [42] | European descent | 2 | 2,702/5,726 | ? | | | P < E-06
Mavaddat et al. [43] | European descent | 2 | 4,470/4,560 | ? | | | P < 5E-02/6.25E-03
Michailidou et al. [44] | European descent | 2 | 10,052/12,575 | 45,290/41,880 | | | P < 5E-08
Murabito et al. [45] | European descent | 1 | 250/1,345 | | | | P < 5E-08
Sehrawat et al. [46] | European descent | 2 | 348/348 | 1,153/1,215 | | | P < 6.4E-08
Stacey et al. [47] | European descent | 2 | 1,600/11,563 | 4,554/17,577 | | | P < E-07/P < 6.8E-08
Stacey et al. [48] | European descent | 2 | 6,145/33,016 | 5,028/32,090 | | |
Thomas et al. [49] | European descent | 3 | 1,145/1,142 | 4,547/4,434 | 4,078/5,223 | | P < 5E-07
Turnbull et al. [50] | European descent | 2 | 3,659/4,897 | 12,576/12,223 | | | P < E-04




Table 5 Breast cancer risk genome-wide association studies of non-European descent populations 



GWAS | Population | Stages | Cases/controls stage 1 | Cases/controls stage 2 | Cases/controls stage 3 | Cases/controls stage 4 | Study P-value cut-off for significance
Cai et al. [51] | Asian descent | 4 | 2,062/2,066 | 4,146/1,823 | 6,436/6,716 | 4,509/6,338 |
Chen et al. [52] | African-American descent | 2 | 3,153/2,831 | 3,607/11,330 | | | P < 5E-08
Gold et al. [53] | Ashkenazi Jewish descent | 3 | 249/299 | 950/979 | 243/187 | | P < E-05
Haiman et al. [39] | African descent/European descent | 2 | African descent (1,004/2,745), European descent (1,718/3,670) | European descent (2,292/16,901) | | |
Kim et al. [54] | Asian descent | 3 | 2,273/2,052 | 2,052/2,169 | 1,997/1,676 | | P < 5E-04
Long et al. [55] | Asian descent/European descent | 3 | 2,073/2,084 | 4,425/1,915 | Asian descent (6,173/6,340), European descent (2,797/2,662) | |
Long et al. [56] | Asian descent | 4 | 2,918/2,324 | 3,972/3,852 | 5,203/5,138 | 7,489/9,934 | P < 5E-08
Zheng et al. [57] | Asian descent | 3 | 1,505/1,522 | 1,554/1,576 | 3,472/900 | | P < 5E-08
Zheng et al. [58] | Asian descent | 2 | 23,637/25,579 | | | | P < 1.5E-03

Results

Significantly more breast cancer risk GWA study 
nominated SNPs are located at orthologs of rat Mcs/Mcsm 
loci compared to random rat genomic regions 

We picked 28 GWA studies of breast cancer risk in which 
well-defined populations were analyzed (Table 4). Physical 
locations of polymorphisms that failed the final validation 
step and polymorphisms that reached genome-wide sig- 
nificance were determined using dbSNP [31]. We included 
SNPs that failed the final validation step of the respective 
study, because it has been suggested that many true asso- 
ciations are ruled out in a GWA study due to stringent 
statistical analysis methods [7]. We determined if se- 
quences containing these polymorphisms were located at 
either a human genome region orthologous to a known 
rat mammary cancer locus or to a randomly selected re- 
gion of the rat genome. Our goal was to determine if 
GWA study-nominated potentially-associated (did not 
pass final validation) and associated (genome-wide signifi- 
cant) risk polymorphisms, together map more often to 
human orthologous regions of rat mammary cancer 
susceptibility loci than to randomly selected rat gen- 
ome segments of similar size. If yes, it would suggest 
that human GWA information combined with rat gen- 
etic susceptibility information is broadly useful to de- 
termine true genetic associations. Overall, rat Mcs/ 
Mcsm loci are mapped to shorter genomic segments 
than other rat mammary cancer risk loci; therefore, we 
first compared overlap between human GWA study 
nominated breast cancer risk SNPs and rat Mcs/Mcsm 
loci to overlap of human associated SNPs with ran- 
domly selected rat genomic regions not known to con- 
tain mammary cancer susceptibility loci (Figure 1). 
Human GWA studies were grouped by population of 
descent for comparison. There was a significant differ- 
ence between the number of GWA study nominated 
SNPs mapping to rat Mcs/Mcsm loci compared to ran- 
dom rat regions in studies analyzing populations of 
European descent (66 SNPs to 51 SNPs respectively,
P-value <0.05). Human GWA study identified polymorphisms
located at orthologs of rat loci are listed in
Additional file 2. This result supports previous studies 
indicating rat genetic susceptibility is useful to predict 
and study human breast cancer risk loci. There was no 
difference in Asian or African-American descent popula- 
tions. This is likely due to a limited number of published 
population-based breast cancer risk genetic-association 
studies using these populations. 
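
The mapping step behind these counts, checking whether a SNP position falls inside any human orthologous interval and counting a SNP only once when intervals overlap, can be sketched as follows. This is an illustrative sketch, not the authors' procedure, which compared dbSNP locations to the intervals directly. The two intervals are human orthologous regions listed in Table 1; the SNP names and positions are hypothetical and chosen only to exercise the logic.

```python
from typing import Dict, Iterable, Tuple

Region = Tuple[str, int, int]   # (chromosome, start, end), inclusive
Snp = Tuple[str, int]           # (chromosome, position)


def maps_to_any(snp: Snp, regions: Iterable[Region]) -> bool:
    """True if the SNP lies inside at least one region."""
    chrom, pos = snp
    return any(chrom == r_chrom and start <= pos <= end
               for r_chrom, start, end in regions)


def count_mapped(snps: Dict[str, Snp], regions: Iterable[Region]) -> int:
    """Count SNPs inside the regions; each SNP counts once even when
    several overlapping regions contain it."""
    region_list = list(regions)
    return sum(1 for snp in snps.values() if maps_to_any(snp, region_list))


# Two overlapping human orthologous intervals taken from Table 1.
locus_regions = [
    ("chr9", 37_562_516, 37_589_491),
    ("chr9", 27_325_071, 123_488_955),
]

# Hypothetical SNPs, for illustration only.
example_snps = {
    "snp_a": ("chr9", 37_570_000),    # inside both intervals, counted once
    "snp_b": ("chr9", 130_000_000),   # outside both intervals
    "snp_c": ("chr3", 50_000_000),    # different chromosome
}

if __name__ == "__main__":
    print(count_mapped(example_snps, locus_regions))  # prints 1
```

Keying the SNPs by identifier is what prevents double counting here: a SNP reported in more than one study, or covered by more than one locus, still contributes a single entry.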

Breast cancer risk GWA study nominated polymorphisms 
map more often to orthologs of all known rat mammary 
cancer loci than to randomly selected regions 

Next, we included additional rat mammary cancer sus- 
ceptibility loci that have been identified, but span large 





Figure 1 Number of breast cancer risk GWA study nominated 
SNPs mapping to rat Mcs/Mcsm regions. Number of GWA study 
nominated SNPs mapping to orthologs of rat Mcs/Mcsm loci and rat 
random regions. Dark grey columns represent the number of GWA 
study nominated human SNPs mapping to the human orthologous 
regions of the Mcs/Mcsm loci. Light grey columns represent the 
number of GWA study nominated human SNPs mapping to the 
human orthologous regions of the random rat control regions. The 
difference between risk associated SNPs mapping to rat Mcs/Mcsm 
and random rat regions was statistically significant for European 
populations. Asterisk indicates P-value <0.05 using chi-square analysis 
with number of SNPs mapping to Mcs/Mcsm set as the observed 
value and number of SNPs mapping to random rat regions as the 
expected value. GWA, genome-wide association. 



genomic segments. Loci added were Mcstm1, Mcstm2,
Mcsta1, Emca1-2 and Emca4-8 [20-23]. The same random
rat genomic regions used previously were used in
this analysis to be consistent. Respectively, 179 and 51 
GWA study nominated polymorphisms were located in 
human orthologous regions to rat mammary cancer loci 
and randomly selected rat regions (Figure 2A) when 
studies using populations of European descent were 
considered. This difference was statistically significant 
(P <0.01). Note, some rat mammary cancer loci identi- 
fied in independent studies have long regions of over- 
lap. Consequently, several human GWA study identified 
polymorphisms mapped to human sequence orthologous 
to overlapping rat susceptibility loci. As it is not known if 
these rat loci contain unique sub-loci, human risk associ- 
ated polymorphisms mapping to overlapping rat regions 
were counted only once. The size of the rat genome cov- 
ered by all known rat mammary cancer susceptibility loci 
compared to control loci was disproportionate (Table 3). 
However, the ratio of breast cancer risk associated human 
SNPs at orthologs to rat mammary cancer susceptibility 
loci to SNPs at random segments was higher than the ra- 
tio of susceptibility loci bases to random bases (3.5 versus 
2.9). This result was relatively proportionate to the previ- 
ous result when only rat Mcs/Mcsm loci were considered 
(1.29 for Mcs/Mcsm and 1.21 for all susceptibility loci), 
suggesting that a potential bias was not introduced by the 
increase in total genomic coverage. 
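
The ratios quoted in this paragraph can be reproduced from the SNP counts in the text and the unique base totals in Table 3. The sketch below is one way to arrive at the reported figures, assuming that the values 1.29 and 1.21 are the SNP ratio divided by the base ratio; the variable names are hypothetical.

```python
# SNP counts from studies of European descent populations (from the text)
snps_mcs = 66          # SNPs at orthologs of Mcs/Mcsm loci
snps_all_loci = 179    # SNPs at orthologs of all known rat mammary cancer loci
snps_random = 51       # SNPs at orthologs of random rat regions

# Total unique bases (Table 3)
bases_mcs = 312_321_457
bases_all_loci = 905_100_793
bases_random = 312_517_940

# Ratio of SNPs at susceptibility loci to SNPs at random regions
snp_ratio_all = snps_all_loci / snps_random
snp_ratio_mcs = snps_mcs / snps_random

# Ratio of susceptibility locus bases to random region bases
base_ratio_all = bases_all_loci / bases_random
base_ratio_mcs = bases_mcs / bases_random

# SNP ratio normalized by genomic coverage (assumed reading of 1.29 and 1.21)
normalized_mcs = snp_ratio_mcs / base_ratio_mcs
normalized_all = snp_ratio_all / base_ratio_all

if __name__ == "__main__":
    print(f"all loci: SNP ratio {snp_ratio_all:.1f}, base ratio {base_ratio_all:.1f}")
    print(f"normalized: Mcs/Mcsm {normalized_mcs:.2f}, all loci {normalized_all:.2f}")
```

Dividing by the base ratio is what makes the two analyses comparable, since the full set of loci covers almost three times as much of the genome as the random regions.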

Figure 2 Number of breast cancer risk GWA study nominated 
SNPs mapping to orthologs of rat mammary cancer loci or 
randomly selected rat genomic segments. Dark grey columns 
indicate GWA study nominated SNPs that map to human 
orthologous regions of rat mammary cancer loci. Light grey columns 
indicate GWA study nominated SNPs that mapped to human 
orthologous regions of randomly selected rat genomic regions. 
A) Studies by population descent. Asterisks indicate statistical 
significance (P <0.01). The difference between risk associated 
SNPs mapping to rat mammary cancer loci and random rat 
regions in studies of European, Asian and African-American descent 
populations was significant (P-values <0.01 using chi-square analysis 
with the number of SNPs mapping to rat mammary cancer loci set as 
the observed value and the number of SNPs mapping to random 
rat regions as the expected value). B) Associated and potentially 
associated SNPs identified in populations of European descent 
that mapped to rat regions of interest were compared using logistic 
regression. Threshold of association was not a significant predictor of 
whether a SNP mapped to an ortholog of a rat mammary cancer locus 
or a random rat region. 'ns' indicates a comparison was not statistically
significant. GWA, genome-wide association. 



Not surprisingly, only 179 of the 533 SNPs (33.6%)
identified in human GWA studies of populations of
European descent were located at orthologs to rat
mammary cancer associated loci. Notably, 57 of the
533 total SNPs evaluated were reported in more than
one GWA study; a majority of these were potential
associations that failed the final validation step of the
respective study. These results further suggest that
several breast cancer risk associated SNPs do not reach
genome-wide statistical significance in human
population-based genetic studies.

Since more breast cancer risk polymorphisms nomi- 
nated from GWA studies of populations of European 
descent mapped to orthologs of rat mammary cancer 
loci than to randomly selected regions of the rat gen- 
ome, we determined if this was the case for association 
studies using non-European descent populations. We 
queried the nine GWA studies of populations of non- 
European ancestry that are listed in Table 5. These were 
GWA studies using populations of African, African-American,
Ashkenazi Jewish, and Asian descent; however,
only polymorphisms from studies using African-American,
Ashkenazi Jewish and Asian descent populations mapped
to any of the human orthologous segments to rat genomic
regions picked for this study. First, results from all studies
of non-European descent populations were combined 
(Figure 2A). Eighty-nine risk associated SNPs mapped 
to orthologs of rat mammary cancer loci and 26 SNPs 
were located at randomly selected rat regions. Next, 
studies using populations of Asian, Ashkenazi Jewish 
and African-American descent were considered separ- 
ately. This resulted in 64 Asian descent population 
SNPs mapping to orthologs of rat mammary cancer 
loci and 18 SNPs to random rat regions. Twenty-four 
SNPs identified in studies of African-American descent
populations were located at orthologs to rat mammary
cancer loci and eight SNPs in random rat regions. The
difference between rat mammary cancer loci and random
regions was statistically significant (P <0.01) for
both populations (Figure 2A). Interestingly, one SNP
from a study of an Ashkenazi Jewish population mapped
to the human orthologous region of rat Mcsta1, but no
GWA study nominated SNP from that study mapped to a
random rat region [53]. The paucity of human SNPs mapping
to orthologs of rat mammary cancer loci from populations
of African and Ashkenazi Jewish descent may be due to a
limited number of studies conducted on these populations.
On the other hand, it may indicate that susceptibility
alleles different from those currently identified in
laboratory rats are segregating in these populations. Out
of 285 SNPs considered from studies using populations of
non-European descent, 89 SNPs or 31% mapped to orthologs
of rat mammary cancer loci (Additional file 2). Fifteen
risk-associated SNPs were represented in more than one
human GWA study.
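
The chi-square comparison described in the Figure 2 legend can be sketched as follows, using the counts reported above. This is a minimal illustration, not the authors' script: it assumes the statistic is (O - E)^2 / E, with the rat mammary cancer loci count as the observed value and the random-region count as the expected value, and it assumes one degree of freedom, which the text does not state.

```python
import math

def chi_square_single(observed, expected):
    """Return (statistic, p_value) for (O - E)^2 / E.

    ASSUMPTION: the p-value is taken from a chi-square distribution
    with 1 degree of freedom; the source does not specify this.
    """
    stat = (observed - expected) ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# (SNPs at orthologs of rat mammary cancer loci, SNPs at random rat regions)
counts = {
    "Non-European combined": (89, 26),
    "Asian": (64, 18),
    "African-American": (24, 8),
}

results = {name: chi_square_single(o, e) for name, (o, e) in counts.items()}
for name, (stat, p_value) in results.items():
    print(f"{name}: chi2 = {stat:.1f}, P = {p_value:.2e}")
```

Under these assumptions, each comparison falls below the P <0.01 threshold used in Figure 2A.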

Next, GWA-study nominated variants from popula- 
tions of European descent were separated by associated 
(reached genome-wide significance) and potentially asso- 
ciated (did not reach genome-wide significance after the 
final stage) variants (Figure 2B). Nineteen associated 
SNPs were located at rat mammary cancer loci com- 
pared to seven SNPs that mapped to random rat regions. 
By comparison, 160 potentially associated SNPs mapped
to rat mammary cancer susceptibility loci and
44 SNPs mapped to random rat regions. A logistic
regression was performed using threshold of association
(associated or potentially associated) as the independent
variable and rat genome location (ortholog of a rat
mammary cancer risk locus or a randomly selected
locus) as the dependent variable. Threshold of association
was not a significant predictor (P-value = 0.54). This
result, that both associated and potentially associated
breast cancer risk variants map more often to orthologs
of rat mammary cancer risk loci than to rat regions not
associated with susceptibility, strongly supports the idea
that comparative genomics between humans and rats may be an
effective integrative approach to determine which potential
associations nominated by human association studies
are true positives.
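
With a single binary predictor, the fitted logistic regression coefficient equals the log odds ratio of the corresponding 2x2 table, and its Wald standard error is the square root of the summed reciprocal cell counts. The sketch below applies this to the counts reported above. It assumes a two-sided Wald test; the text does not say which test or software the authors used, and the variable names are illustrative.

```python
import math

# Counts reported in the text for studies of European descent populations
assoc_loci, assoc_random = 19, 7            # reached genome-wide significance
potential_loci, potential_random = 160, 44  # did not reach genome-wide significance

# Log odds ratio of mapping to a rat mammary cancer locus ortholog,
# associated versus potentially associated SNPs
log_or = math.log((assoc_loci / assoc_random) / (potential_loci / potential_random))

# Wald standard error of the log odds ratio
se = math.sqrt(1 / assoc_loci + 1 / assoc_random
               + 1 / potential_loci + 1 / potential_random)

z = log_or / se
# Two-sided p-value from the standard normal distribution (ASSUMED test)
p_value = math.erfc(abs(z) / math.sqrt(2))
odds_ratio = math.exp(log_or)

print(f"odds ratio = {odds_ratio:.2f}, z = {z:.2f}, P = {p_value:.2f}")
```

Under these assumptions the p-value comes out close to the reported 0.54, consistent with threshold of association not predicting where a SNP maps.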

Human populations have been studied more extensively
for breast cancer genetic risk than have rat populations;
therefore, it is not surprising that human studies
have yielded a considerable number of SNPs associated at
genome-wide significance in regions where it is not
known whether the rat genome contains a concordant allele.
Interestingly, seven strongly associated human SNPs
were in sequences orthologous to the randomly selected
rat genome regions that are not known to associate with
rat mammary cancer based on studies evaluating specific
rat strains; thus, it is possible that a portion of the rat
genome used in this study as random-genome control
regions actually contains unidentified rat
mammary cancer susceptibility loci. More rat genomic
regions associated with mammary cancer risk may therefore
be identified with additional rat genetic studies. To date,
only six inbred rat strains have been used to identify rat
genomic regions associated with mammary cancer risk
[15,16,20-23]. It is therefore highly likely that
incorporating additional rat strains will identify more
mammary cancer susceptibility loci. It is also possible
that more extensive analysis of previously studied rat
strains, for example using a higher density of genetic
markers, may yield additional susceptibility loci.

Twenty-one of the 24 known rat mammary cancer associated
loci are orthologous to human loci containing
SNPs that are either associated or potentially associated
with breast cancer risk. Fourteen of the known rat mammary
cancer associated loci are orthologous to human
risk alleles marked by GWA study nominated SNPs
reaching genome-wide significance. Human GWA study
designs do not definitively determine causative genes or
mechanisms. The laboratory rat is a versatile experimental
organism to complement human studies of breast
cancer. For example, inbred rat strains provide a model
with reduced genetic variation that can be genetically
manipulated and environmentally controlled. The overlap
between human breast and rat mammary cancer
susceptibility associated loci suggests rats can be used
extensively to study genetically determined mechanisms
and environment interactions that will translate directly
to human breast cancer risk and prevention.

Human GWA study nominated breast cancer risk SNPs 
map similarly to rat mammary cancer associated loci 
identified using 7,12-dimethylbenz[a]anthracene or 
beta-estradiol 

Several rat mammary cancer loci used in this study were
identified using DMBA to induce mammary tumors. These
are Mcs1a-c, Mcs2-4, Mcs5a1, Mcs5a2, Mcs5b-c, Mcs6-Mcs8,
Mcsm1, Mcstm1-2 and Mcsta1. The remaining rat
mammary cancer loci considered were identified using
beta-estradiol to induce mammary carcinogenesis.
Estradiol-associated susceptibility loci are Emca1-2 and Emca4-8.
While DMBA is representative of environmental polycyclic
aromatic hydrocarbons, this synthesized mammary carcinogen
is not found in nature. Conversely, estradiol is an
endogenous environmental exposure associated with breast
cancer risk. Human GWA study nominated SNPs mapping
to orthologs of rat mammary cancer loci identified using
DMBA were compared to those identified using
beta-estradiol. We considered SNPs from all GWA studies,
irrespective of the population used. We noted that many
DMBA and beta-estradiol identified rat mammary cancer
loci overlap. In fact, seven of the fourteen DMBA associated
rat mammary cancer loci overlap at least one beta-estradiol
associated rat mammary cancer risk locus, and five of the
seven beta-estradiol loci overlap rat mammary cancer loci
identified using DMBA. To account for this overlap, human
SNPs mapping to overlapping rat mammary cancer loci,
one identified using DMBA and the other using
beta-estradiol, were included once in the 'DMBA' group and once
in the 'beta-estradiol' group. These results are shown in
Figure 3. A relatively similar number of GWA study
nominated SNPs mapped to orthologs of rat mammary
cancer loci that were identified using DMBA (181
SNPs) and beta-estradiol (146 SNPs). This suggests
that different mammary carcinoma induction methods
can effectively identify rat susceptibility loci relevant to
human disease risk, and it also suggests that a plethora of
carcinogenesis mechanisms may be genetically determined.
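
The counting rule for overlapping loci can be illustrated with a small sketch. All locus and SNP names below are hypothetical placeholders, not data from the study; the point is only that a SNP mapping to both a DMBA-identified and a beta-estradiol-identified locus is counted once in each group.

```python
# HYPOTHETICAL toy data: placeholder locus names and induction methods
locus_method = {
    "locus_1": "DMBA",
    "locus_2": "beta-estradiol",
    "locus_3": "DMBA",
}

# HYPOTHETICAL toy data: placeholder SNPs and the loci whose orthologs contain them
snp_to_loci = {
    "snp_A": ["locus_1"],
    "snp_B": ["locus_1", "locus_2"],  # falls in overlapping loci
    "snp_C": ["locus_2"],
    "snp_D": ["locus_3"],
}

group_counts = {"DMBA": 0, "beta-estradiol": 0}
for snp, loci in snp_to_loci.items():
    # Use a set so a SNP is counted at most once per induction method
    methods = {locus_method[locus] for locus in loci}
    for method in methods:
        group_counts[method] += 1

print(group_counts)
```

Because of this rule, the two group totals can sum to more than the number of distinct SNPs.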

Discussion 

It has been suggested that the use of Bonferroni-based
correction procedures to protect against multiple comparisons
in GWA studies is too stringent and results in
an abundance of false-negative associations with little
recourse to sort these from true-negative associations.
Therefore, we considered associated and potentially
associated human SNPs from breast cancer risk GWA
studies to determine if SNPs that failed validation and
SNPs that reached genome-wide significance map to
respective regions of the rat genome known to associate
with rat mammary cancer risk more often than to regions
of the rat genome that are not known to associate
with susceptibility. Results presented here indicate that
the rat genome is useful to prioritize and rank human alleles
potentially associated with risk. The rat genome is
useful regardless of the human population studied.
Significantly more SNPs from GWA studies of populations
of European, Asian, and African-American descent map
to human orthologous regions of rat mammary cancer
loci than to human orthologous regions of randomly selected
rat genomic regions not known to associate with
mammary cancer susceptibility. This supports the general
idea that there are SNPs associated with breast cancer
risk that are missed due to conservative statistical
methods used in GWA studies, and that the rat is useful
to parse out important genetic variation in susceptibility
to mammary carcinogenesis.

Figure 3 Number of breast cancer risk GWA study nominated
SNPs mapping to regions identified using DMBA or beta-estradiol.
Number of GWA study nominated SNPs mapping to rat mammary
cancer loci separated by method of mammary carcinogenesis
induction. Slightly more SNPs mapped to orthologs of rat loci
that were identified using DMBA than beta-estradiol. DMBA,
7,12-dimethylbenz[a]anthracene; GWA, genome-wide association.

Interestingly, we were unable to map GWA study nominated
SNPs to 3 of the 24 known rat mammary cancer
loci. These were Mcs1a, Mcs5a1 and Mcs5c. However,
using a genome-targeted population-based genetic association
study, a human SNP associated with breast cancer
risk has been identified at human MCS5A1 [9]. The
risk-associated SNPs at MCS5A1 are adjacent to a breast
cancer risk-associated SNP at MCS5A2, which was identified
in two independent human population-based studies
[9,43]. Taken together, these observations indicate a high
degree of concordance between the genetics of breast cancer
susceptibility in humans and mammary cancer susceptibility
in rats. Interestingly, there are several human genomic
regions that are human GWA study nominated hotspots
(for example, 19q13, FGFR2)
that are not known to have concordant rat orthologs. One
explanation is that human breast and rat mammary cancer
susceptibility are controlled by both overlapping and
non-overlapping genetic mechanisms. Another explanation is
that there are rat genomic regions associated with mammary
cancer risk yet to be discovered by using additional
inbred strains, more extensive analysis of strains previously
studied, and different methods of carcinogenesis
induction.

Conclusions 

There is extensive genomic overlap between human
breast and rat mammary cancer susceptibility. The rat
genome may be useful for identifying true-positive
associations regardless of the human population used for a
GWA study. The laboratory rat will continue to be an
important model organism for researching genetically
determined mechanisms of mammary cancer susceptibility
that may translate directly to human susceptibility.
An appreciable number of GWA study nominated SNPs
not meeting genome-wide significance levels have genomic
overlap with rat mammary cancer susceptibility
loci. This supports the general idea that Bonferroni-based
multiple-comparison correction procedures are too stringent,
and that complementary approaches that integrate rat
genomics would be highly effective for prioritizing breast
cancer risk associated alleles.

Additional files 



Additional file 1: Table S1. List of GWAS nominated SNPs used in analysis.

Additional file 2: Table S2. Breast cancer risk associated
polymorphisms from studies of European descent populations that map
to rat mammary cancer loci and random rat regions. Table S3. Breast
cancer risk associated polymorphisms from studies of non-European
descent populations that map to rat mammary cancer loci and
random rat regions.